ONS / NISR
2021
Plotly is an open-source graphing library that allows users to easily create highly customisable interactive and static charts in JavaScript, Python or R.
Plotly was built using Python and the Django framework, with a front end built with JavaScript, HTML and CSS and the D3.js visualization library.
Plotly allows for the creation of professional-looking charts for both online and offline reports.
Plotly Express (plotly.express) is a high-level API for plotly.py that lets users create charts quickly without having to dig into the plotly figure objects and understand their components. In terms of "grammar", plotly.express for Python is similar to the standard plotly library for R, and it expects to be fed "tidy" (long) data.
Both the higher-level plotly.express and the standard plotly libraries have their pros and cons, and different users will prefer one over the other. The beauty is that any plotly figure, regardless of the language in which it was created, can be exported to JSON (JavaScript Object Notation) and used to recreate the same chart in any other supported language for further editing, in very few steps.
Plotly can be installed using pip. Remember to restart the kernel once the install has completed!
!pip install -U plotly
First, let's import some data to play with from cars.csv in the data folder.
import pandas as pd
cars = pd.read_csv('./data/cars.csv')
cars
|  | Make | Model | Type | Origin | DriveTrain | MSRP | Invoice | EngineSize | Cylinders | Horsepower | MPG_City | MPG_Highway | Weight | Wheelbase | Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acura | MDX | SUV | Asia | All | 36945 | 33337 | 3.5 | 6.0 | 265 | 17 | 23 | 4451 | 106 | 189 |
| 1 | Acura | RSX Type S 2dr | Sedan | Asia | Front | 23820 | 21761 | 2.0 | 4.0 | 200 | 24 | 31 | 2778 | 101 | 172 |
| 2 | Acura | TSX 4dr | Sedan | Asia | Front | 26990 | 24647 | 2.4 | 4.0 | 200 | 22 | 29 | 3230 | 105 | 183 |
| 3 | Acura | TL 4dr | Sedan | Asia | Front | 33195 | 30299 | 3.2 | 6.0 | 270 | 20 | 28 | 3575 | 108 | 186 |
| 4 | Acura | 3.5 RL 4dr | Sedan | Asia | Front | 43755 | 39014 | 3.5 | 6.0 | 225 | 18 | 24 | 3880 | 115 | 197 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 423 | Volvo | C70 LPT convertible 2dr | Sedan | Europe | Front | 40565 | 38203 | 2.4 | 5.0 | 197 | 21 | 28 | 3450 | 105 | 186 |
| 424 | Volvo | C70 HPT convertible 2dr | Sedan | Europe | Front | 42565 | 40083 | 2.3 | 5.0 | 242 | 20 | 26 | 3450 | 105 | 186 |
| 425 | Volvo | S80 T6 4dr | Sedan | Europe | Front | 45210 | 42573 | 2.9 | 6.0 | 268 | 19 | 26 | 3653 | 110 | 190 |
| 426 | Volvo | V40 | Wagon | Europe | Front | 26135 | 24641 | 1.9 | 4.0 | 170 | 22 | 29 | 2822 | 101 | 180 |
| 427 | Volvo | XC70 | Wagon | Europe | All | 35145 | 33112 | 2.5 | 5.0 | 208 | 20 | 27 | 3823 | 109 | 186 |
428 rows × 15 columns
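If cars.csv is not to hand, you can build a small stand-in from the rows shown above. This is only a hypothetical convenience for trying the examples: it keeps 10 of the 428 rows and a subset of the columns. Because px functions refer to columns by name, it also helps to check the exact column spellings:

```python
import pandas as pd

# Hypothetical stand-in for cars.csv: values copied from the rows shown above
# (10 of the 428 rows, and only a subset of the columns)
cars_sample = pd.DataFrame({
    "Make": ["Acura"] * 5 + ["Volvo"] * 5,
    "Type": ["SUV", "Sedan", "Sedan", "Sedan", "Sedan",
             "Sedan", "Sedan", "Sedan", "Wagon", "Wagon"],
    "Origin": ["Asia"] * 5 + ["Europe"] * 5,
    "MSRP": [36945, 23820, 26990, 33195, 43755,
             40565, 42565, 45210, 26135, 35145],
    "EngineSize": [3.5, 2.0, 2.4, 3.2, 3.5, 2.4, 2.3, 2.9, 1.9, 2.5],
    "Horsepower": [265, 200, 200, 270, 225, 197, 242, 268, 170, 208],
    "MPG_City": [17, 24, 22, 20, 18, 21, 20, 19, 22, 20],
    "Weight": [4451, 2778, 3230, 3575, 3880, 3450, 3450, 3653, 2822, 3823],
})

# px functions refer to columns by name, so check the exact spellings and types
print(cars_sample.columns.tolist())
print(cars_sample.dtypes)
```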
By convention, plotly.express is imported under the alias px:
import plotly.express as px
This gives us access to many functions for creating different charts. Each has its own set of arguments that can be altered to change the appearance and behavior of the charts it produces.
The following code generates a simple x-y scatter plot using the plotly.express.scatter() function.
The first argument of this function is the pandas DataFrame that holds the data we wish to visualize (in our case, cars).
The arguments x and y then take strings matching the names of the DataFrame columns that we want to plot on the x and y axes, respectively. In this case we want to plot Horsepower against MPG_City.
# assign the created scatter plot to the object 'scatter'
scatter = px.scatter(cars, # dataset object
x = 'Horsepower', # variable to show on x axis
y = 'MPG_City', # variable to show on y axis
)
# display the plot inline by making the figure the last expression in the cell
scatter
That was easy! It also seems that more horsepower results in a lower MPG_City. However, we may want to change the colour and size of the points based on variables in the dataframe to gain further insight.
scatter = px.scatter(cars,
x = 'Horsepower',
y = 'MPG_City',
color = 'Type', # color by 'Type', also creates a legend
size = 'EngineSize', # size points by 'EngineSize'
title = 'My first px scatter plot!', # add a title (wrapping the text in <b></b> tags would make it bold)
)
scatter
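Creating a line chart is just as easy as creating a scatter plot; we simply use the plotly.express.line() function instead. Here we keep only the SUVs and sort them by Horsepower, because px.line joins the points in the order of the rows.
line = px.line(cars[cars['Type']=='SUV'].sort_values('Horsepower'), # SUVs only, sorted so each line runs left to right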
x = 'Horsepower', # variable to show on x axis
y = 'MPG_City', # variable to show on y axis
color = 'Origin'
)
line
We can also create bar charts just as easily using the plotly.express.bar() function.
Note that each function may have a few arguments unique to it, so simply changing scatter to bar in the function call may raise an error (though it may also just work!).
bar = px.bar(cars,
x = 'Origin',
y = 'MSRP',
color = 'Type',
barmode = 'group', # new argument: draw the bars for each 'Type' side by side rather than stacking them
title = 'My first px bar plot!'
)
bar
The bar chart shows what we want, but it looks a bit odd: each car is drawn as its own segment stacked within the bars. The hover label reflects this, showing only the value of the nearest segment rather than, for example, the total MSRP.
We can do a little better with the plotly.express.histogram() function, which aggregates the data by group and plots the total (or average, minimum or maximum) MSRP of each vehicle type by region of manufacture.
hist = px.histogram(cars,
x = 'Origin',
y = 'MSRP',
histfunc = 'sum', # sum the MSRP by color group and split by x
color = 'Type',
barmode = 'group',
title = 'Total MSRP of vehicle types by region of manufacture'
)
hist
Pie charts aren't loved by everyone, but they're simple enough to create using plotly. Examples can be found at https://plotly.com/python/pie-charts/
We'll use a small, made-up dataset to illustrate pie charts.
pie_data = pd.DataFrame({'Category': ['Research','Teaching','Estates','Support','Climate'],
'Expenditure': [4500, 2500, 1000, 500, 500]})
pie = px.pie(pie_data, names='Category', values='Expenditure', hole=0) # names sets the slice labels; hole=0 gives a full pie
pie
Using the cars dataset, create a scatter plot of EngineSize vs Horsepower. Make the size of each point proportional to the weight of the car and colour each point based on the part of the world in which the car was built.
Create a bar chart showing the average Horsepower by car Type. Split and colour the bars by the Origin of the car.
1.3.1 Find the total weight of the cars in each part of the world.
1.3.2 Create a donut chart of these total weights, and "pull out" the slice of the donut associated with the USA. Hint: the documentation for the px.pie() function will be helpful here.
We've seen that plotly charts can be styled in lots of different ways. However, in most cases it would be nice if plotly could simply apply the appropriate styles for us. That way we don't have to style every chart we make manually.
Enter nisr_style
nisr_style is a Python library that, once imported, automatically styles all the charts you make. It ensures that everyone produces charts that look the same without having to update the styles manually!
nisr_style is only available from NISR's GitHub repository, so installing it requires access to that repository, which you all should have.
nisr_style can be installed like any other Python package using pip:
!pip install git+https://github.com/NISR-analysis/ds-styleguide.git
Note that we need to use git+ and then the URL of the repository that contains nisr_style.
nisr_style
Once we've installed the library, we just need to import nisr_style and any plotly chart that we make will be in the house style.
import pandas as pd
import plotly.express as px
import nisr_style
cars = pd.read_csv('./data/cars.csv')
scatter = px.scatter(cars,
x = 'Horsepower',
y = 'MPG_City',
color = 'Type', # color by 'Type', also creates a legend
size = 'EngineSize', # size points by 'EngineSize'
title = 'My first px scatter plot!', # add a title (wrapping the text in <b></b> tags would make it bold)
)
scatter
This library will automatically theme all the different chart types available in plotly. For example, here is the donut chart from exercise 1.3 in the house style.
cars = pd.read_csv('./data/cars.csv')
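total_weight = cars.groupby('Origin')[['Weight']].sum().reset_index() # total (not average) weight per region
pie = px.pie(total_weight, names='Origin', values='Weight', hole=0.5) # hole > 0 turns the pie into a donut
pie.update_traces(pull=[0,0,0.3]) # pull out the 3rd slice; groupby sorts regions alphabetically, so this is USA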
pie
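It's possible to export any plotly chart as a static image, such as a .png or .pdf, for use in other documents by using the write_image method. Static image export relies on an additional package, kaleido, which can be installed with pip install -U kaleido.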
import pandas as pd
import plotly.express as px
import nisr_style
cars = pd.read_csv('./data/cars.csv')
scatter = px.scatter(cars,
x = 'Horsepower',
y = 'MPG_City',
color = 'Type', # color by 'Type', also creates a legend
size = 'EngineSize', # size points by 'EngineSize'
title = 'My first px scatter plot!', # add a title (wrapping the text in <b></b> tags would make it bold)
)
scatter.write_image('./scatter.png', scale=4)
Saving a plotly chart as a static image means that we lose its interactivity. To keep it, we can save the chart as an HTML file using the write_html method. This is useful if you want to send the chart to someone without them having to run a notebook to see it, because the file opens in any web browser.
scatter = px.scatter(cars,
x = 'Horsepower',
y = 'MPG_City',
color = 'Type', # color by 'Type', also creates a legend
size = 'EngineSize', # size points by 'EngineSize'
title = 'My first px scatter plot!', # add a title (wrapping the text in <b></b> tags would make it bold)
)
scatter.write_html('./scatter.html')